Adaptive Auxiliary Task Weighting for Reinforcement Learning
Reinforcement learning is known to be sample inefficient, preventing its application to many real-world problems, especially those with high-dimensional observations like images. Transferring knowledge from other auxiliary tasks is a powerful tool for improving learning efficiency. However, the use of auxiliary tasks has been limited so far due to the difficulty of selecting and combining different auxiliary tasks. In this work, we propose a principled online learning algorithm that dynamically combines different auxiliary tasks to speed up training for reinforcement learning. Our method is based on the idea that auxiliary tasks should provide gradient directions that, in the long term, help to decrease the loss of the main task. We show in various environments that our algorithm can effectively combine a variety of different auxiliary tasks and achieve significant speedup compared to previous heuristic approaches of adapting auxiliary task weights.
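To illustrate the idea that an auxiliary task is useful when its gradient direction helps decrease the main loss, here is a minimal sketch. It is not the authors' implementation: it assumes a one-step simplification (the abstract speaks of the long-term effect), toy quadratic losses, and hypothetical names (`TARGET`, `main_grad`, `aux_grads`, `run_demo`, `alpha`, `beta`). Each auxiliary weight is raised when that task's gradient aligns with the main-task gradient and lowered when it conflicts.

```python
import numpy as np

# Toy setup (all hypothetical): a 2-D parameter vector and quadratic losses.
TARGET = np.array([1.0, 1.0])


def main_loss(theta):
    # L_main = 0.5 * ||theta - TARGET||^2
    return 0.5 * float(np.sum((theta - TARGET) ** 2))


def main_grad(theta):
    # Gradient of L_main with respect to theta.
    return theta - TARGET


def aux_grads(theta):
    # Aux task 0 shares the main task's minimizer (helpful):
    #   L_0 = ||theta - TARGET||^2
    # Aux task 1 pulls toward -TARGET (conflicts with the main task at the start):
    #   L_1 = 0.5 * ||theta + TARGET||^2
    return np.stack([2.0 * (theta - TARGET), theta + TARGET])


def run_demo(steps=100, alpha=0.1, beta=0.5):
    theta = np.zeros(2)
    weights = np.array([1.0, 1.0])  # start with equal auxiliary weights
    initial_loss = main_loss(theta)
    for _ in range(steps):
        g_main = main_grad(theta)
        g_aux = aux_grads(theta)  # shape (num_aux, dim)
        # Parameter step on the weighted sum of main and auxiliary losses.
        theta = theta - alpha * (g_main + weights @ g_aux)
        # Weight step (one-step approximation, an assumption of this sketch):
        # increase w_i when aux gradient i aligns with the main gradient,
        # decrease it when the two point in opposing directions.
        weights = weights + beta * alpha * (g_aux @ g_main)
    return weights, initial_loss, main_loss(theta)


if __name__ == "__main__":
    w, before, after = run_demo()
    print("auxiliary weights:", w)
    print("main loss before/after:", before, after)
```

In this toy run the weight of the aligned task grows and the weight of the conflicting task shrinks. A practical version would estimate the gradients from sampled batches and would need to handle gradient noise, which this sketch ignores.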
Reviews: Adaptive Auxiliary Task Weighting for Reinforcement Learning
I think the results are much more comprehensive now. I raised my score accordingly. If I understand the main idea correctly, the proposed method can be interpreted as a gradient-based meta-learning method (e.g., MAML) in that the algorithm finds the gradient of the main objective by taking into account the parameter update procedure. It would be good to provide this perspective and also review the relevant work on meta-gradients for RL (e.g., MAML [Finn et al.], Meta-gradient RL [Xu et al.], Learning intrinsic reward [Zheng et al.]). Nevertheless, I think this is a novel application of meta-gradient for tuning auxiliary task weights.
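The meta-gradient reading can be made concrete with a short derivation. This is a sketch under assumptions not stated in the review: a single plain gradient step with step size \alpha, auxiliary weights w_i, and illustrative symbols \theta, \mathcal{L}_{\text{main}}, and \mathcal{L}_i.

```latex
% Parameter update on the weighted combination of losses:
\theta_{t+1} = \theta_t - \alpha \Big( \nabla_\theta \mathcal{L}_{\text{main}}(\theta_t)
             + \sum_i w_i \, \nabla_\theta \mathcal{L}_i(\theta_t) \Big)

% Differentiating the main loss after the update with respect to w_i (chain rule),
% then approximating the gradient at \theta_{t+1} by the gradient at \theta_t:
\frac{\partial \mathcal{L}_{\text{main}}(\theta_{t+1})}{\partial w_i}
  = \nabla_\theta \mathcal{L}_{\text{main}}(\theta_{t+1})^{\top}
    \frac{\partial \theta_{t+1}}{\partial w_i}
  = -\alpha \, \nabla_\theta \mathcal{L}_{\text{main}}(\theta_{t+1})^{\top}
    \nabla_\theta \mathcal{L}_i(\theta_t)
  \approx -\alpha \, \nabla_\theta \mathcal{L}_{\text{main}}(\theta_t)^{\top}
    \nabla_\theta \mathcal{L}_i(\theta_t)
```

Under these assumptions, gradient descent on w_i increases the weight of an auxiliary task whose gradient has a positive inner product with the main-task gradient, which is the sense in which the weight update differentiates through the parameter update.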
Reviews: Adaptive Auxiliary Task Weighting for Reinforcement Learning
Reviewers agreed that the paper addresses an important problem in current deep RL research and appreciated the effort the authors put into the rebuttal. New experiments using the 0/1 reward formulation and a comparison to fixed, hand-tuned hyperparameters addressed two of the main concerns raised by reviewers. In the end, all three reviewers recommended accepting the paper.
Adaptive Auxiliary Task Weighting for Reinforcement Learning
Lin, Xingyu, Baweja, Harjatin, Kantor, George, Held, David